-
Notifications
You must be signed in to change notification settings - Fork 537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove TE from dockerfile and instead add as optional dependency #1605
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Might be careful about this. Changing how the Dockerfile in Foundry is built will almost definitely result in cache misses for how images are built. You can see here in the Docker job is taking 21 minutes instead of the standard 41 seconds. Edit: might have been wrong about reason why docker job took 21 minutes, could be because of this pr -- lgtm pending that TE version is correct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that the version now pinned is the same/works equivalently to the commit we had pinned?
:p also how did image only take 21 minutes to install, in DLE TE takes ~25 min to install alone |
@dakinggg it should work the same, but is an upgrade from what we had pinned before. Do you want me to run the fp8 regression tests off of the image here? @KuuCi yeah i'd expect this to result in cache misses because I'm taking out the TE install from the docker image and it's instead getting installed as a foundry dependency. I think with the pypi release they're prebuilding some parts of the package which makes it faster, but there's also some additional things they do to get around build isolation & supporting multiple frameworks |
@snarayan21 yes please |
@dakinggg all 3 regression tests are deterministically the same with new TE (loss points are overlapping in these graphs below) -- |
See title
@dakinggg in terms of not breaking workflows, should we also add this to the gpu dep group?
Installed correctly in docker image: